# Replication Files for _Moral Foundation Measurements Fail to Converge on Multilingual Party Manifestos_


## Steps to take before the replication scripts can be run

### Manifesto Data
Obtain an API key for the Manifesto Project Database by registering at (https://manifesto-project.wzb.eu/), and set it as the environment variable `manifesto_project_key`. You can set it either inside the Docker container once it is running, or in the Dockerfile before building.
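
As a minimal sketch, the key can be exported in the shell before the scripts run. The value below is a placeholder, not a real key:

```shell
# Placeholder value (assumption): replace with the key from your Manifesto Project account.
export manifesto_project_key="YOUR_API_KEY"

# Abort with a message if the variable is empty or unset.
: "${manifesto_project_key:?manifesto_project_key is not set}"
```

If the key is already set on the host, `docker run -e manifesto_project_key ...` passes it into the container; in a Dockerfile, the equivalent would be a line such as `ENV manifesto_project_key=YOUR_API_KEY`.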

### FastText Embeddings
Obtain fastText embeddings for English, German, Spanish and Dutch from  
*Wirsching EM, Rodriguez PL, Spirling A, Stewart BM. Multilanguage Word Embeddings for Social Scientists: Estimation, Inference, and Validation Resources for 157 Languages. Political Analysis. Published online 2024:1-8. doi:10.1017/pan.2024.17.*
You can find them here: (https://alcembeddings.org/). Download the four models, which are named like `fasttext_model_enwiki.bin` (with the language code replaced for each language), and place them in the folder `data/ddr/embeddings`.
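
A quick way to confirm that all four models are in place is sketched below. The language codes `en`, `de`, `es` and `nl` are an assumption derived from the `fasttext_model_enwiki.bin` pattern; adjust them if the downloaded files are named differently.

```shell
# List every expected embedding file that is missing from the given folder.
# Assumption: the files follow the pattern fasttext_model_<code>wiki.bin
# with the language codes en, de, es and nl.
missing_embeddings() {
  dir="${1:-data/ddr/embeddings}"
  for lang in en de es nl; do
    file="$dir/fasttext_model_${lang}wiki.bin"
    [ -f "$file" ] || printf '%s\n' "$file"
  done
}

# Prints nothing when all four files are present.
missing_embeddings data/ddr/embeddings
```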

### Moral Foundation Dictionaries
Download the translated Moral Foundation Dictionaries from the following sources. Format each one like the English example, name the files after the pattern `mfd_en.dic` (with the respective language code), and save them in the `data/mfd` folder.
- English from https://github.com/kbenoit/quanteda.dictionaries/blob/master/sources/MFD/moral_foundations_dictionary.dic
- German and Dutch from the Supplementary Materials from *Bos, L. and Minihold, S. (2022), The Ideological Predictors of Moral Appeals by European Political Elites; An Exploration of the Use of Moral Rhetoric in Multiparty Systems. Political Psychology, 43: 45-63. https://doi.org/10.1111/pops.12739*
- Spanish from *Carvalho, F., & Guedes, G. (2022). Dicionário de Fundamentos Morais em Espanhol. In M. Nussbaum, C. Infante, & J. Sánchez (Eds.), Nuevas ideas en informática educativa, volumen 16 (pp. 287–291). Universidad de Chile. https://www.tise.cl/Volumen16/Short%20Paper/TISE_2022_paper_11.pdf*. GitHub Repo at: (https://github.com/LaCAfe/mfd-es)

### Moral Foundations Questionnaire
Fill in the three empty columns (`sentence1` etc.) in the .csv file in the folder `data/ccr/` with the items of the Moral Foundations Questionnaires, which can be downloaded from (https://moralfoundations.org/questionnaires/). If it is easier to work in spreadsheet software such as LibreOffice Calc, use that and export to .csv, making sure that the fields are properly escaped and formatted.
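
For reference, the usual CSV escaping rule is to wrap a field in double quotes and to double any quotes inside it. The helper below is a hypothetical sketch of that rule, not part of the replication scripts, and the example text is a placeholder rather than an actual questionnaire item.

```shell
# Quote one field for CSV: double embedded quotes, then wrap the field in quotes.
csv_quote() {
  printf '"%s"' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

# Placeholder text containing a comma and quotes.
csv_quote 'An item with a comma, and a "quoted" word'
printf '\n'
```

Fields that contain commas, quotes or line breaks need this treatment; spreadsheet exports normally apply it automatically.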

### Moral Foundations Vignettes
As with the Questionnaire, fill in the `data/ccr/vignettes.csv` file by downloading the relevant vignettes and datasets and extracting the wording of each vignette. This time, there is one vignette per line.
- English vignettes from *Clifford, S., Iyengar, V., Cabeza, R. et al. Moral foundations vignettes: a standardized stimulus database of scenarios based on moral foundations theory. Behav Res 47, 1178–1198 (2015). https://doi.org/10.3758/s13428-014-0551-2*
    - The authors do not appear to have shared the (English) vignettes in that article, so we recommend using the Dutch data instead, since only a selection of the vignettes was translated into Spanish.
- Spanish vignettes from *Aguiar, F., Corradi, G. & Aguilar, P. Ageing and disgust: Is old age associated with harsher moral judgements?. Curr Psychol 42, 8460–8470 (2023). https://doi.org/10.1007/s12144-022-03423-1*
- Dutch vignettes from *Hopp, F. R., Jargow, B., Kouwen, E., & Bakker, B. N. (2024). The Dutch moral foundations stimulus database: An adaptation and validation of moral vignettes and sociomoral images in a Dutch sample. Judgment and Decision Making, 19. https://doi.org/10.1017/jdm.2024.5* 
- German vignettes, based on the authors' translations, are provided in the `data/ccr/vignettes_examples.csv` file.

### PartyFacts, ParlGov, CHES

**PartyFacts**
Download the linking dataset of party identifiers (external parties) from PartyFacts (https://partyfacts.herokuapp.com/download/) and put it in `data/external`. 

**ParlGov**
Download the cabinet make-up data (view cabinet) from ParlGov (https://parlgov.fly.dev/data/) as a CSV file, and put it in `data/external/`.

**CHES** 
Download the 1999-2019 CHES trend file from CHES (https://www.chesdata.eu/ches-europe) as a CSV file, put it into `data/external/`, and rename it to `ches.csv` (I was too afraid of their CSV naming conventions).
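
The copy-and-rename step could look like the sketch below. `place_ches` is a hypothetical helper, and the path of the downloaded file is whatever your browser saved it as.

```shell
# Copy a downloaded CHES trend file to data/external/ches.csv.
# $1: path of the downloaded file (its original name is set by CHES)
# $2: target folder, defaulting to data/external
place_ches() {
  dest="${2:-data/external}"
  mkdir -p "$dest" && cp "$1" "$dest/ches.csv"
}
```

Call it from the repository root with the path of the download as the first argument.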

## Running it all

For ease of development, and to separate things more cleanly, the replication consists of two Docker images. The base image only installs the Python and R dependencies and creates a virtual environment. The script image, built on top of it, copies over the folder of embeddings and external inputs, as well as the scripts. The build is based on Ubuntu 24.10. Building the base image takes significantly longer (about 20 minutes) than building the script image (about 1 minute).
Note that to use GPUs, which significantly speeds up the MoralBERT process in particular, you will need to install the NVIDIA Container Toolkit and set it up on your machine. Otherwise, run the container without the `--gpus all` flag.
````
docker build -f Dockerfile.base -t base-image .
docker build -f Dockerfile.scripts -t script-image .
docker run --gpus all -it -v "$(pwd)/data:/app/data" script-image
````
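
To switch between the GPU and CPU-only variants without editing the command, a small wrapper such as the following could be used. This is a sketch: the function name and the `USE_GPU` and `DOCKER` variables are hypothetical and not part of the replication files.

```shell
# Run the script image, adding --gpus all only when USE_GPU=1.
# DOCKER can be set to "echo" to print the command instead of running it.
run_container() {
  set -- -it -v "$(pwd)/data:/app/data" script-image
  if [ "${USE_GPU:-0}" = "1" ]; then
    set -- --gpus all "$@"
  fi
  "${DOCKER:-docker}" run "$@"
}

# Usage:
#   USE_GPU=1 run_container   # with the NVIDIA Container Toolkit set up
#   run_container             # CPU only
```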

We've run it on a system with an _AMD Ryzen 7 5800X 8-Core Processor_, _64GB of RAM_ and an _NVIDIA RTX A4000_, where the full run takes around 24 hours. The data gathering and wrangling scripts each take a few minutes at most, whereas the DDR scoring takes roughly 2 hours, the CCR roughly 3.5 hours, and MoralBERT about 1 hour 45 minutes per dimension and foundation, adding up to roughly 18 hours for MoralBERT.

Two log files are attached (in the `/data/` folder). One contains all output unaltered, which is sadly very verbose because progress bars are included (I tried, and failed, to filter them out or redirect them in bash). The other one has been edited to remove all interim progress bars. The latter is therefore more human-readable, the former technically complete.
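
If you want to condense the verbose log yourself, one possible approach is sketched below. It assumes that the progress bars redraw themselves with carriage returns, which has not been verified for these logs, and keeps only the text after the last carriage return on each line. `strip_progress` is a hypothetical helper and was not used to produce the edited log.

```shell
# Keep only the final state of each line that was redrawn with carriage returns.
strip_progress() {
  awk '{ n = split($0, parts, "\r"); print parts[n] }'
}

# Usage (file names are placeholders):
#   strip_progress < full.log > condensed.log
```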

## Citations
The list of seed words used for DDR contains the English terms used by: *Garten, J., Hoover, J., Johnson, K.M. et al. Dictionaries and distributions: Combining expert knowledge and large scale textual data content analysis. Behav Res 50, 344–361 (2018). https://doi.org/10.3758/s13428-017-0875-9*

### Note on the requirements file
gensim 4.3.3 contains a bug fix for scipy, but it requires an older numpy version (< 2), which breaks the default thinc version that would be installed together with spacy. Downgrading thinc to the lowest version that spacy allows lets everything work together.
